Overview

Dataset statistics

Number of variables: 9
Number of observations: 392
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.7 KiB
Average record size in memory: 72.3 B

Variable types

NUM: 8
CAT: 1

Reproduction

Analysis started: 2020-08-25 01:12:55.812310
Analysis finished: 2020-08-25 01:13:06.382340
Duration: 10.57 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

cubicInches is highly correlated with cylinders and 1 other field [High correlation]
cylinders is highly correlated with cubicInches [High correlation]
weightLbs is highly correlated with cubicInches [High correlation]
brand has 27 (6.9%) zeros [Zeros]

Variables

MPG
Real number (ℝ≥0)

Distinct count: 127
Unique (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.445918367346938
Minimum: 9.0
Maximum: 46.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 9
5-th percentile: 13
Q1: 17
median: 22.75
Q3: 29
95-th percentile: 37
Maximum: 46.6
Range: 37.6
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 7.805007487
Coefficient of variation (CV): 0.3328940826
Kurtosis: -0.5159934946
Mean: 23.44591837
Median Absolute Deviation (MAD): 5.8
Skewness: 0.4570923231
Sum: 9190.8
Variance: 60.91814187
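
Several of the MPG figures are simple functions of the others. As a quick consistency check, the sketch below recomputes the range, the interquartile range, the coefficient of variation, and the variance from the reported minimum, maximum, quartiles, mean, and standard deviation. The variable names are illustrative and are not part of the report.

```python
# Values copied from the MPG statistics in this report.
minimum, maximum = 9.0, 46.6
q1, q3 = 17.0, 29.0
mean = 23.44591837
std = 7.805007487

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = squared standard deviation

print(f"range={value_range:.1f} iqr={iqr:.0f} cv={cv:.4f} variance={variance:.3f}")
```

The recomputed values should agree with the reported Range, IQR, CV, and Variance up to rounding of the inputs.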
[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
13                  20      5.1%
14                  19      4.8%
18                  17      4.3%
15                  16      4.1%
26                  14      3.6%
16                  13      3.3%
19                  12      3.1%
24                  11      2.8%
28                  10      2.6%
25                  10      2.6%
22                  10      2.6%
27                  9       2.3%
23                  9       2.3%
20                  9       2.3%
29                  8       2.0%
31                  7       1.8%
17                  7       1.8%
21                  7       1.8%
30                  7       1.8%
36                  6       1.5%
12                  6       1.5%
32                  6       1.5%
17.5                5       1.3%
15.5                5       1.3%
20.2                4       1.0%
Other values (102)  145     37.0%

Smallest values:

Value               Count   Frequency (%)
9                   1       0.3%
10                  2       0.5%
11                  4       1.0%
12                  6       1.5%
13                  20      5.1%
14                  19      4.8%
14.5                1       0.3%
15                  16      4.1%
15.5                5       1.3%
16                  13      3.3%

Largest values:

Value               Count   Frequency (%)
46.6                1       0.3%
44.6                1       0.3%
44.3                1       0.3%
44                  1       0.3%
43.4                1       0.3%
43.1                1       0.3%
41.5                1       0.3%
40.8                1       0.3%
39.4                1       0.3%
39.1                1       0.3%

cylinders
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.471938775510204
Minimum: 3.0
Maximum: 8.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 4
Q1: 4
median: 4
Q3: 8
95-th percentile: 8
Maximum: 8
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.705783247
Coefficient of variation (CV): 0.3117328825
Kurtosis: -1.398198638
Mean: 5.471938776
Median Absolute Deviation (MAD): 0
Skewness: 0.5081092403
Sum: 2145
Variance: 2.909696487
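
A MAD of 0 can look surprising next to a standard deviation of about 1.7. The sketch below rebuilds the cylinders column from the value counts listed for this variable and recomputes the median, MAD, sum, and sample variance with Python's standard library. It illustrates where the numbers come from; it is not the code that pandas-profiling runs.

```python
import statistics

# Value counts copied from the cylinders frequency table in this report.
counts = {4: 199, 8: 103, 6: 83, 3: 4, 5: 3}

# Expand the counts back into one value per observation (392 in total).
values = [value for value, n in counts.items() for _ in range(n)]

median = statistics.median(values)
# MAD: the median of the absolute deviations from the median.
mad = statistics.median([abs(v - median) for v in values])
# statistics.variance uses the n - 1 (sample) denominator.
sample_variance = statistics.variance(values)

print(len(values), sum(values), median, mad, round(sample_variance, 6))
```

Because more than half of the observations (199 of 392) equal the median of 4, more than half of the absolute deviations are 0, so their median is 0 as well.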
[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
4                   199     50.8%
8                   103     26.3%
6                   83      21.2%
3                   4       1.0%
5                   3       0.8%

Smallest values:

Value               Count   Frequency (%)
3                   4       1.0%
4                   199     50.8%
5                   3       0.8%
6                   83      21.2%
8                   103     26.3%

Largest values:

Value               Count   Frequency (%)
8                   103     26.3%
6                   83      21.2%
5                   3       0.8%
4                   199     50.8%
3                   4       1.0%

cubicInches
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 80
Unique (%): 20.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 194.41326530612244
Minimum: 68.0
Maximum: 455.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 68
5-th percentile: 85
Q1: 105
median: 151
Q3: 275.75
95-th percentile: 400
Maximum: 455
Range: 387
Interquartile range (IQR): 170.75

Descriptive statistics

Standard deviation: 104.6428227
Coefficient of variation (CV): 0.5382493962
Kurtosis: -0.7782892557
Mean: 194.4132653
Median Absolute Deviation (MAD): 61
Skewness: 0.7016875496
Sum: 76210
Variance: 10950.12034

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
97                  21      5.4%
350                 18      4.6%
98                  18      4.6%
250                 17      4.3%
318                 17      4.3%
140                 15      3.8%
400                 13      3.3%
225                 13      3.3%
91                  12      3.1%
232                 11      2.8%
302                 11      2.8%
121                 11      2.8%
151                 9       2.3%
120                 9       2.3%
351                 8       2.0%
231                 8       2.0%
90                  8       2.0%
200                 7       1.8%
105                 7       1.8%
85                  7       1.8%
304                 7       1.8%
122                 7       1.8%
79                  6       1.5%
119                 6       1.5%
156                 6       1.5%
Other values (55)   120     30.6%

Smallest values:

Value               Count   Frequency (%)
68                  1       0.3%
70                  3       0.8%
71                  2       0.5%
72                  1       0.3%
76                  1       0.3%
78                  1       0.3%
79                  6       1.5%
80                  1       0.3%
81                  1       0.3%
83                  1       0.3%

Largest values:

Value               Count   Frequency (%)
455                 3       0.8%
454                 1       0.3%
440                 2       0.5%
429                 3       0.8%
400                 13      3.3%
390                 1       0.3%
383                 2       0.5%
360                 4       1.0%
351                 8       2.0%
350                 18      4.6%

horsepower
Real number (ℝ≥0)

Distinct count: 93
Unique (%): 23.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.46938775510205
Minimum: 46.0
Maximum: 230.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 46
5-th percentile: 60.55
Q1: 75
median: 93.5
Q3: 126
95-th percentile: 180
Maximum: 230
Range: 184
Interquartile range (IQR): 51

Descriptive statistics

Standard deviation: 38.49115993
Coefficient of variation (CV): 0.3684443908
Kurtosis: 0.6969469997
Mean: 104.4693878
Median Absolute Deviation (MAD): 19.5
Skewness: 1.087326282
Sum: 40952
Variance: 1481.569393

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
150                 22      5.6%
90                  20      5.1%
88                  19      4.8%
110                 18      4.6%
100                 17      4.3%
75                  14      3.6%
95                  14      3.6%
105                 12      3.1%
67                  12      3.1%
70                  12      3.1%
65                  10      2.6%
97                  9       2.3%
85                  9       2.3%
80                  7       1.8%
145                 7       1.8%
140                 7       1.8%
72                  6       1.5%
92                  6       1.5%
78                  6       1.5%
68                  6       1.5%
84                  6       1.5%
180                 5       1.3%
115                 5       1.3%
60                  5       1.3%
71                  5       1.3%
Other values (68)   133     33.9%

Smallest values:

Value               Count   Frequency (%)
46                  2       0.5%
48                  3       0.8%
49                  1       0.3%
52                  4       1.0%
53                  2       0.5%
54                  1       0.3%
58                  2       0.5%
60                  5       1.3%
61                  1       0.3%
62                  2       0.5%

Largest values:

Value               Count   Frequency (%)
230                 1       0.3%
225                 3       0.8%
220                 1       0.3%
215                 3       0.8%
210                 1       0.3%
208                 1       0.3%
200                 1       0.3%
198                 2       0.5%
193                 1       0.3%
190                 3       0.8%

weightLbs
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 346
Unique (%): 88.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2977.5841836734694
Minimum: 1613.0
Maximum: 5140.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1613
5-th percentile: 1931.6
Q1: 2225.25
median: 2803.5
Q3: 3614.75
95-th percentile: 4464
Maximum: 5140
Range: 3527
Interquartile range (IQR): 1389.5

Descriptive statistics

Standard deviation: 849.40256
Coefficient of variation (CV): 0.2852656743
Kurtosis: -0.8092593883
Mean: 2977.584184
Median Absolute Deviation (MAD): 639.5
Skewness: 0.5195856741
Sum: 1167213
Variance: 721484.709

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
2130                4       1.0%
1985                4       1.0%
2125                3       0.8%
2945                3       0.8%
2265                3       0.8%
2720                3       0.8%
2155                3       0.8%
2300                3       0.8%
1950                2       0.5%
3940                2       0.5%
2930                2       0.5%
1937                2       0.5%
3425                2       0.5%
2110                2       0.5%
2065                2       0.5%
2408                2       0.5%
3672                2       0.5%
3725                2       0.5%
1825                2       0.5%
1990                2       0.5%
2670                2       0.5%
1965                2       0.5%
2045                2       0.5%
2164                2       0.5%
3410                2       0.5%
Other values (321)  332     84.7%

Smallest values:

Value               Count   Frequency (%)
1613                1       0.3%
1649                1       0.3%
1755                1       0.3%
1760                1       0.3%
1773                1       0.3%
1795                2       0.5%
1800                2       0.5%
1825                2       0.5%
1834                1       0.3%
1835                1       0.3%

Largest values:

Value               Count   Frequency (%)
5140                1       0.3%
4997                1       0.3%
4955                1       0.3%
4952                1       0.3%
4951                1       0.3%
4906                1       0.3%
4746                1       0.3%
4735                1       0.3%
4732                1       0.3%
4699                1       0.3%

time-to-sixty
Real number (ℝ≥0)

Distinct count: 17
Unique (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.681122448979592
Minimum: 8.0
Maximum: 25.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 8
5-th percentile: 11
Q1: 14
median: 16
Q3: 17
95-th percentile: 20
Maximum: 25
Range: 17
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.761231566
Coefficient of variation (CV): 0.1760863468
Kurtosis: 0.5026648441
Mean: 15.68112245
Median Absolute Deviation (MAD): 2
Skewness: 0.3030136101
Sum: 6147
Variance: 7.62439976

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
16                  64      16.3%
15                  64      16.3%
14                  49      12.5%
17                  47      12.0%
13                  35      8.9%
19                  29      7.4%
18                  29      7.4%
12                  21      5.4%
11                  13      3.3%
20                  12      3.1%
21                  8       2.0%
22                  7       1.8%
10                  6       1.5%
9                   3       0.8%
24                  2       0.5%
25                  2       0.5%
8                   1       0.3%

Smallest values:

Value               Count   Frequency (%)
8                   1       0.3%
9                   3       0.8%
10                  6       1.5%
11                  13      3.3%
12                  21      5.4%
13                  35      8.9%
14                  49      12.5%
15                  64      16.3%
16                  64      16.3%
17                  47      12.0%

Largest values:

Value               Count   Frequency (%)
25                  2       0.5%
24                  2       0.5%
22                  7       1.8%
21                  8       2.0%
20                  12      3.1%
19                  29      7.4%
18                  29      7.4%
17                  47      12.0%
16                  64      16.3%
15                  64      16.3%

year
Real number (ℝ≥0)

Distinct count: 13
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1976.9795918367347
Minimum: 1971.0
Maximum: 1983.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1971
5-th percentile: 1971
Q1: 1974
median: 1977
Q3: 1980
95-th percentile: 1983
Maximum: 1983
Range: 12
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.683736544
Coefficient of variation (CV): 0.001863315412
Kurtosis: -1.16744622
Mean: 1976.979592
Median Absolute Deviation (MAD): 3
Skewness: 0.01968829963
Sum: 774976
Variance: 13.56991492

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
1974                40      10.2%
1979                36      9.2%
1977                34      8.7%
1983                30      7.7%
1976                30      7.7%
1980                29      7.4%
1971                29      7.4%
1982                28      7.1%
1973                28      7.1%
1978                28      7.1%
1981                27      6.9%
1972                27      6.9%
1975                26      6.6%

Smallest values:

Value               Count   Frequency (%)
1971                29      7.4%
1972                27      6.9%
1973                28      7.1%
1974                40      10.2%
1975                26      6.6%
1976                30      7.7%
1977                34      8.7%
1978                28      7.1%
1979                36      9.2%
1980                29      7.4%

Largest values:

Value               Count   Frequency (%)
1983                30      7.7%
1982                28      7.1%
1981                27      6.9%
1980                29      7.4%
1979                36      9.2%
1978                28      7.1%
1977                34      8.7%
1976                30      7.7%
1975                26      6.6%
1974                40      10.2%

brand
Real number (ℝ≥0)

ZEROS

Distinct count: 30
Unique (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.323979591836734
Minimum: 0
Maximum: 29
Zeros: 27
Zeros (%): 6.9%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 6
median: 11
Q3: 21
95-th percentile: 29
Maximum: 29
Range: 29
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.558786423
Coefficient of variation (CV): 0.6423596167
Kurtosis: -1.027448284
Mean: 13.32397959
Median Absolute Deviation (MAD): 5
Skewness: 0.3061704322
Sum: 5223
Variance: 73.25282504

[Figure: histogram with fixed size bins (bins=10)]
Most frequent values:

Value               Count   Frequency (%)
11                  48      12.2%
6                   47      12.0%
21                  31      7.9%
9                   28      7.1%
0                   27      6.9%
26                  26      6.6%
8                   23      5.9%
29                  22      5.6%
3                   17      4.3%
22                  16      4.1%
13                  13      3.3%
14                  12      3.1%
16                  11      2.8%
18                  10      2.6%
10                  8       2.0%
20                  8       2.0%
1                   7       1.8%
28                  6       1.5%
7                   6       1.5%
19                  4       1.0%
25                  4       1.0%
24                  4       1.0%
23                  3       0.8%
15                  3       0.8%
4                   2       0.5%
Other values (5)    6       1.5%

Smallest values:

Value               Count   Frequency (%)
0                   27      6.9%
1                   7       1.8%
2                   2       0.5%
3                   17      4.3%
4                   2       0.5%
5                   1       0.3%
6                   47      12.0%
7                   6       1.5%
8                   23      5.9%
9                   28      7.1%

Largest values:

Value               Count   Frequency (%)
29                  22      5.6%
28                  6       1.5%
27                  1       0.3%
26                  26      6.6%
25                  4       1.0%
24                  4       1.0%
23                  3       0.8%
22                  16      4.1%
21                  31      7.9%
20                  8       2.0%

target
Categorical

Distinct count: 3
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value               Count   Frequency (%)
2                   245     62.5%
1                   79      20.2%
0                   68      17.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
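
As an illustration of one such property, the sketch below uses Python's standard `unicodedata` module to look up the general category of each character in the target column and tallies the categories by occurrence. The character counts are copied from this report. Script and block lookups are not available in the standard library, so the sketch covers only the category property.

```python
import unicodedata
from collections import Counter

# Character counts copied from the "target" frequency table in this report.
char_counts = {"2": 245, "1": 79, "0": 68}

category_counts = Counter()
for char, n in char_counts.items():
    # unicodedata.category returns the two-letter general category code;
    # "Nd" is the code for "Decimal Number".
    category_counts[unicodedata.category(char)] += n

print(dict(category_counts))
```

All three characters fall into the same category, which is why the report lists a single unique category covering all 392 occurrences.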

Most occurring characters

Value               Count   Frequency (%)
2                   245     62.5%
1                   79      20.2%
0                   68      17.3%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      392     100.0%

Most frequent Decimal Number characters

Value               Count   Frequency (%)
2                   245     62.5%
1                   79      20.2%
0                   68      17.3%

Most occurring scripts

Value               Count   Frequency (%)
Common              392     100.0%

Most frequent Common characters

Value               Count   Frequency (%)
2                   245     62.5%
1                   79      20.2%
0                   68      17.3%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               392     100.0%

Most frequent ASCII characters

Value               Count   Frequency (%)
2                   245     62.5%
1                   79      20.2%
0                   68      17.3%

Interactions

[Figures: 64 pairwise interaction plots of the variables, not reproduced here]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
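
The calculation described above can be sketched in plain Python. The function name `pearson_r` is illustrative, and the data are the MPG and weightLbs values from the first five rows shown in the Sample section, so the result describes only those five rows and not the full dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominators).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)

# MPG and weightLbs from the first five sample rows of this report.
mpg = [14.0, 31.9, 17.0, 15.0, 30.5]
weight = [4209.0, 1925.0, 3449.0, 3761.0, 2051.0]

r = pearson_r(mpg, weight)
# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r(mpg, [0.4536 * w + 100.0 for w in weight])
print(round(r, 4), round(r_rescaled, 4))
```

The second call demonstrates the invariance under changes in location and scale: converting the weights to another unit and adding an offset gives the same coefficient.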

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
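
A minimal sketch of this procedure, assuming tied values receive the average of the ranks they span (a common convention, not something this report states). The helper names and the toy data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r; the n - 1 factors of covariance and deviations cancel."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data: y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data the rank correlation is perfect while Pearson's r falls short of 1, which is the difference between monotonic and linear association described above.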

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
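
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name and toy data are illustrative. Statistical libraries often report a tie-adjusted variant instead (SciPy's `kendalltau`, for example, defaults to tau-b), so values can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction,
        # discordant when they move in opposite directions; ties count as neither.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Toy data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.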

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: two missing-value plots, not reproduced here]

Sample

First rows

     MPG   cylinders  cubicInches  horsepower  weightLbs  time-to-sixty  year    brand  target
0    14.0  8.0        350.0        165.0       4209.0     12.0           1972.0  6      2
1    31.9  4.0        89.0         71.0        1925.0     14.0           1980.0  29     0
2    17.0  8.0        302.0        140.0       3449.0     11.0           1971.0  11     2
3    15.0  8.0        400.0        150.0       3761.0     10.0           1971.0  6      2
4    30.5  4.0        98.0         63.0        2051.0     17.0           1978.0  6      2
5    23.0  8.0        350.0        125.0       3900.0     17.0           1980.0  4      2
6    13.0  8.0        351.0        158.0       4363.0     13.0           1974.0  11     2
7    14.0  8.0        440.0        215.0       4312.0     9.0            1971.0  21     2
8    25.4  5.0        183.0        77.0        3530.0     20.0           1980.0  15     0
9    37.7  4.0        89.0         62.0        2050.0     17.0           1982.0  26     1

Last rows

     MPG   cylinders  cubicInches  horsepower  weightLbs  time-to-sixty  year    brand  target
382  17.6  8.0        302.0        129.0       3725.0     13.0           1980.0  11     2
383  19.0  3.0        70.0         97.0        2330.0     14.0           1973.0  14     1
384  36.0  4.0        79.0         58.0        1825.0     19.0           1978.0  23     0
385  33.0  4.0        105.0        74.0        2190.0     14.0           1982.0  29     0
386  25.0  4.0        113.0        95.0        2228.0     14.0           1972.0  26     1
387  25.5  4.0        122.0        96.0        2300.0     16.0           1978.0  21     2
388  21.0  6.0        155.0        107.0       2472.0     14.0           1974.0  16     2
389  11.0  8.0        318.0        210.0       4382.0     14.0           1971.0  9      2
390  17.0  6.0        163.0        125.0       3140.0     14.0           1979.0  28     0
391  36.0  4.0        105.0        74.0        1980.0     15.0           1983.0  29     0